Tectogrammatical Annotation of the Wall Street Journal

Authors

  • Silvie Cinková
  • Josef Toman
  • Jan Hajič
  • Kristýna Čermáková
  • Václav Klimeš
  • Lucie Mladová
  • Jana Šindlerová
  • Kristýna Tomšů
  • Zdeněk Žabokrtský
Abstract

This paper gives an overview of the current state of the Prague English Dependency Treebank project. It is an updated version of a draft text that was released along with a CD presenting the first 25% of the PDT-like version of the Penn Treebank – WSJ section (PEDT 1.0). Before the January 2009 release, both the conversion of the original phrase-structure trees into dependency trees and the consistency checks were substantially enhanced to save manual work. The conversion is performed partly by scripted rules and partly by a statistical parser. To make the rules more powerful, the phrase-based Penn Treebank – WSJ was enriched with other publicly available language resources: the manual annotation of flat noun phrases, and the named-entity and coreference tagging. At the moment, 50% of the 1-million-word corpus has been manually annotated and consistency-checked on the tectogrammatical layer.
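The abstract states only that part of the phrase-structure-to-dependency conversion is done by scripted rules. A widely used family of such rules is head percolation: each phrase selects one child as its head, and every other child becomes a dependent of that head's lexical word. The Python sketch below illustrates this general technique on a toy tree. The `Node` class, the `HEAD_RULES` table, and the function names are hypothetical and are not the PEDT conversion rules, which also use a statistical parser and the additional annotations mentioned above. The output is a set of surface dependency arcs only, not tectogrammatical annotation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                                  # phrase label (e.g. "NP") or POS tag on leaves
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on leaves
    index: Optional[int] = None                 # token position, set only on leaves


def leaf(tag: str, word: str, index: int) -> Node:
    return Node(tag, word=word, index=index)


# Hypothetical head rules: search direction plus child labels preferred as head,
# in priority order. Real conversion tables are far larger.
HEAD_RULES = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VBP", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


def find_head_child(node: Node) -> Node:
    direction, priorities = HEAD_RULES.get(node.label, ("left", []))
    kids = node.children if direction == "left" else list(reversed(node.children))
    for wanted in priorities:
        for kid in kids:
            if kid.label == wanted:
                return kid
    return kids[0]  # fallback: first child in the search direction


def to_dependencies(node: Node, arcs: List[Tuple[int, int]]) -> int:
    """Return the token index of node's lexical head; append (head, dependent) arcs."""
    if node.word is not None:
        return node.index
    head_child = find_head_child(node)
    head = to_dependencies(head_child, arcs)
    for kid in node.children:
        if kid is not head_child:
            arcs.append((head, to_dependencies(kid, arcs)))
    return head


# (S (NP (DT The) (NN company)) (VP (VBD sold) (NP (NNS shares))))
tree = Node("S", [
    Node("NP", [leaf("DT", "The", 0), leaf("NN", "company", 1)]),
    Node("VP", [leaf("VBD", "sold", 2), Node("NP", [leaf("NNS", "shares", 3)])]),
])
arcs: List[Tuple[int, int]] = []
root = to_dependencies(tree, arcs)
print(root, arcs)  # 2 [(2, 3), (1, 0), (2, 1)]
```

Here the verb "sold" (index 2) becomes the root, governing "company" and "shares", while "The" attaches to "company". Flat or ambiguous phrases are exactly where such simple tables break down, which is one reason extra resources such as the noun-phrase annotation help.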

Journal:
  • Prague Bull. Math. Linguistics

Volume 92

Published 2009